rfc(decision): Distroless base images #157
| # Resolved questions | ||
| - **Long-term commitment to DHI:** Despite Docker Inc having a history of unexpected licensing and policy changes (Hub rate limiting, Desktop licensing, etc.), DHI was recently made public under Apache 2.0, and a rollback of that decision seems unlikely. If needed, Google Distroless is a practical drop-in fallback — it lags a few patch versions behind but is otherwise compatible. Other solutions may also emerge over time. We can go with DHI images as a default. |
Do you think we would be able to maintain the images if we had to?
We really don't want to do this (it's not Sentry's business), but I tried building base images literally from scratch in a couple of hours, and a drop-in replacement of the base image worked well enough to pass all smoke tests on snuba:
- https://github.com/oioki/python-base-image (own base images from scratch)
- build: try ghcr.io/oioki/python-base-image for distroless targets snuba#7959 (testing own base images)
So, if we had to, it is doable nowadays.
| - **Snuba and getsentry:** These are the largest remaining Python services. The Snuba PoC (https://github.com/getsentry/snuba/pull/7753, https://github.com/getsentry/snuba/pull/7821, https://github.com/getsentry/snuba/pull/7829, https://github.com/getsentry/ops/pull/19824) showed it is feasible. What is the sequencing and who owns driving this to completion? | ||
| - **Local development compatibility:** Are there any blockers that might disrupt local development workflows when switching to distroless? So far this appears to be a non-issue — for example, Snuba distroless containers work fine in `sentry devservices` (https://github.com/getsentry/snuba/pull/7829). | ||
| - **Services with non-trivial runtime deps:** Some services (e.g. uptime-checker with OpenSSL for certificate validation, or services using external libraries) may need extra work. Are there any blockers that make distroless infeasible for them? | ||
| - **Public mirrors for anonymous access:** Pulling directly from `dhi.io` requires a Docker login, which complicates CI pipelines and local image builds for contributors. Should we commit to maintaining public mirrors at `ghcr.io/getsentry/dhi` to allow unauthenticated pulls? See current PoC: https://github.com/getsentry/dhi. |
Needing to log in could be disruptive to self-hosted users.
We are going to mirror a limited set of images (to start, Python and Node) on artifact registries that we control: either GHCR (which may feel more native for self-hosted) or a public Artifact Registry in GCP (which may be faster for SaaS builds). In both cases, pulling base images from the mirror won't require authentication.
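To illustrate what switching to a mirror could look like for a service, here is a minimal Python sketch that rewrites `FROM` lines in a Dockerfile. It assumes the mirror keeps the upstream repository path and tag under `ghcr.io/getsentry/dhi` (so `dhi.io/python:3.13` would become `ghcr.io/getsentry/dhi/python:3.13`); that layout, the example tag, and the helper names are assumptions for illustration, not something the RFC specifies.

```python
# Assumed layout: the mirror keeps the upstream path and tag unchanged.
UPSTREAM_REGISTRY = "dhi.io"
MIRROR_PREFIX = "ghcr.io/getsentry/dhi"


def to_mirror(image: str) -> str:
    """Rewrite a dhi.io image reference to the (hypothetical) mirror path."""
    prefix = UPSTREAM_REGISTRY + "/"
    if not image.startswith(prefix):
        return image  # not a DHI image, leave it untouched
    return f"{MIRROR_PREFIX}/{image[len(prefix):]}"


def rewrite_dockerfile(text: str) -> str:
    """Rewrite the image in every FROM line, keeping flags and stage aliases.

    FROM lines are re-joined with single spaces; other lines are unchanged.
    """
    out = []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            # The image is the first token after FROM that is not a --flag
            # such as --platform=linux/amd64.
            for i, token in enumerate(parts[1:], start=1):
                if not token.startswith("--"):
                    parts[i] = to_mirror(token)
                    break
            line = " ".join(parts)
        out.append(line)
    return "\n".join(out)
```

For example, `rewrite_dockerfile("FROM dhi.io/python:3.13 AS base")` would produce a `FROM` line pointing at the mirror, while images from other registries pass through unchanged.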
| Distroless containers have no shell. You cannot `exec` into a running container and run arbitrary commands. Debugging requires: | ||
| - Attaching an ephemeral debug container with a shell to the running pod (e.g. [`sentry-kube debug`](https://github.com/getsentry/sentry-infra-tools/blob/main/sentry_kube/cli/debug.py)) | ||
| - Using application-level tooling (e.g. interactive shells provided by the framework) rather than OS-level tools, e.g. `getsentry shell` | ||
| - Investing in proper observability (logs, metrics, tracing) instead of ad-hoc inspection |
I'd like us to put in some effort ahead of time to validate that the debugging flow is very smooth - we've definitely run into various issues attempting to attach debugger pods in some of the places we've already swapped out more minimal images.
`sentry-kube debug` is a full replacement for the `kubectl exec` workflow. I'd even consider it a better experience, as it allows elevating permissions to attach runtime profilers, which isn't possible with `kubectl exec` alone.
+1, our existing application images are already very slim, so `exec` is not very useful. `debug` is a better experience today, and anytime I use `exec`, I immediately go into `getsentry shell`.
I guess the only downside for debugging is docker-compose in dev setups and self-hosted where sentry-kube debug doesn't work.
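For readers unfamiliar with the mechanics discussed in this thread, the sketch below builds the two kinds of commands involved. It does not reproduce what `sentry-kube debug` does internally; it only shows the generic `kubectl debug` ephemeral-container invocation, plus a plain Docker equivalent that may help in docker-compose or self-hosted setups, where a debug container joins the target's PID and network namespaces. The function names, the `busybox:1.36` debug image, and the pod and container names in the usage note are placeholders.

```python
import shlex


def kubectl_debug_command(pod, container, namespace="default", image="busybox:1.36"):
    """Build a `kubectl debug` command that attaches an ephemeral container.

    --target shares the process namespace of the named container, so its
    processes are visible from the debug shell.
    """
    return [
        "kubectl", "debug", "-it", pod,
        "-n", namespace,
        f"--image={image}",
        f"--target={container}",
        "--", "sh",
    ]


def docker_debug_command(container, image="busybox:1.36"):
    """Build a `docker run` command for a throwaway debug container.

    The debug container joins the PID and network namespaces of an already
    running container, which gives a shell next to a shell-less image.
    """
    return [
        "docker", "run", "--rm", "-it",
        f"--pid=container:{container}",
        f"--network=container:{container}",
        image, "sh",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; names are placeholders.
    print(shlex.join(kubectl_debug_command("snuba-api-0", "snuba")))
    print(shlex.join(docker_debug_command("snuba")))
```

Printing rather than executing keeps the sketch safe to run anywhere; pass the list to `subprocess.run` to actually launch a session.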
sentry uses /dev/shm for multiprocess IPC, and /tmp for assorted things related to release artifacts and other file uploads.
some of these images explicitly say that /tmp is "hardened" (so basically unusable). while we generally mount tmpfs on /tmp in the container (making this a non-issue), i'm not sure that we do it consistently in all pods that require it. also no idea if self-hosted has this kind of setup at all.
it may be desirable to enforce in the ops repo that tmpfs is mounted at /dev/shm and /tmp in absolutely every container, rather than relying on smoke tests. we have had incidents where new deployments were missing those mounts, leading to really weird issues in multiprocess consumers, so some systematic enforcement would be nice to have for other reasons too.
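One way the enforcement suggested above could be sketched is as a lint over rendered pod specs. The Python below assumes the spec is already parsed into a dict and that "tmpfs" means a Kubernetes `emptyDir` volume with `medium: Memory`; the function name and the example spec are hypothetical, and how the ops repo actually renders or validates manifests is not known from this thread.

```python
REQUIRED_PATHS = ("/dev/shm", "/tmp")


def missing_tmpfs_mounts(pod_spec: dict) -> dict:
    """Return {container name: [required paths without a memory-backed mount]}.

    Only `containers` is checked; initContainers are ignored in this sketch.
    """
    # Volumes backed by memory (tmpfs) are emptyDir volumes with medium "Memory".
    memory_volumes = {
        volume["name"]
        for volume in pod_spec.get("volumes", [])
        if (volume.get("emptyDir") or {}).get("medium") == "Memory"
    }
    problems = {}
    for container in pod_spec.get("containers", []):
        mounted = {
            mount["mountPath"].rstrip("/")
            for mount in container.get("volumeMounts", [])
            if mount["name"] in memory_volumes
        }
        missing = [path for path in REQUIRED_PATHS if path not in mounted]
        if missing:
            problems[container["name"]] = missing
    return problems


# Hypothetical pod spec: "consumer" forgot the /dev/shm mount.
example_spec = {
    "volumes": [
        {"name": "shm", "emptyDir": {"medium": "Memory"}},
        {"name": "tmp", "emptyDir": {"medium": "Memory"}},
    ],
    "containers": [
        {"name": "web", "volumeMounts": [
            {"name": "shm", "mountPath": "/dev/shm"},
            {"name": "tmp", "mountPath": "/tmp"},
        ]},
        {"name": "consumer", "volumeMounts": [
            {"name": "tmp", "mountPath": "/tmp"},
        ]},
    ],
}
```

A CI step could fail the build whenever `missing_tmpfs_mounts` returns a non-empty dict, which would catch a missing mount before deployment rather than through smoke tests.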